Method for improving a timing margin in an integrated circuit by setting a relative phase of receive/transmit and distributed clock signals

ABSTRACT

An embodiment of the invention includes an apparatus that has a first clock on a memory controller hub set to a first clock receive time and a second clock on the memory controller hub set to a first clock transmit time. A first data is sent from the memory controller hub to the memory. A second data is sent from the memory to the memory controller hub, and the second data is checked. At least one of the first clock and the second clock is then adjusted to at least one of a second clock receive time and a second clock transmit time.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The invention relates generally to computers and more particularly to system boards and computer chips.

2. Background Information

Since the advent of computers, computer scientists and engineers have strived to make computers operate faster. One feature of the computer that has remained critical is the time that it takes for data to be transmitted from one component to another component located on the computer board. For example, data may be transferred from the memory to the processor. To transfer data at high speeds and with fidelity, the data transfer must be coordinated in time with the clock signals. Clock signals determine when a data signal is sent and received. If the data signal is sent too early or too late, or if the data is received too early or too late, the data may become corrupt. This is commonly referred to as excess clock-data skew.

A fixed solution built into the computer board is not feasible because the correct receive clock time (RCLK) of data and the correct transmit clock time (TCLK) of data may vary with manufacturing variation of the computer board. Therefore, what is needed is a way of checking the timing of the signals on the computer board.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating a system board in accordance with one embodiment of the invention.

FIG. 2 is a block diagram illustrating the flow of data from the memory controller hub to the memory and the data flow from the memory to the memory controller hub in accordance with one embodiment of the invention.

FIG. 3 is a flow chart illustrating TCLK register in accordance with one embodiment of the invention.

FIG. 4 is a flow chart illustrating RCLK register in accordance with one embodiment of the invention.

FIG. 5 illustrates a flow chart in accordance with one embodiment of the invention.

FIG. 6 is a block diagram illustrating the memory in connection with the memory controller hub in accordance with one embodiment of the invention.

FIG. 7 is a graphic representation of the clock pulse generated by the DRCG chip on the system board in accordance with one embodiment of the invention.

FIG. 8 shows a differential sine wave at 180 degree phase in which data is launched in accordance with one embodiment of the invention.

FIG. 9 shows clock pulses in which data is launched in accordance with one embodiment of the invention.

FIG. 10A illustrates a memory controller hub connected to a direct channel.

FIG. 10B illustrates a memory controller hub connected to a plurality of channels.

FIG. 10C illustrates a memory controller hub connected to a channel.

FIG. 11 shows a clock crossing signal in accordance with one embodiment of the invention.

DETAILED DESCRIPTION OF THE INVENTION

The following detailed description and the accompanying drawings are provided for the purpose of describing and illustrating presently preferred embodiments of the invention only, and are not intended to limit the scope of the invention in any way.

One embodiment of the invention relates to two clocks located on the memory and two clocks located on the memory controller hub. The time to receive data (RCLK) and the time to transmit data (TCLK) for clock 1 and clock 2 on the memory are set to zero, and RCLK and TCLK for the memory controller hub (MCH) are automatically established at the optimum time periods after the configuration of the system board is checked. The information presented below provides a general-to-detailed description of various aspects of several embodiments of the invention.

FIG. 1 is a block diagram illustrating a system board 10 of one embodiment of the invention. System board (e.g., computer board) 10 includes MCH 20 coupled to processor 30, master clock 70, storage device (also referred to herein as memory) 80, direct Rambus clock generator (DRCG) clock 90, such as a clock generator available from Rambus Inc. of Mountain View, Calif., hard disk 60, read-only memory (ROM) 50, and chip 40. MCH 20 controls the data flow in system board 10.

Because MCH 20 and storage device 80 are the devices most frequently used to describe the invention, these devices are described in greater detail, followed by a brief description of the other devices on system board 10. MCH 20 is configured to send data to and receive data from storage device 80. MCH 20 generally operates such that it sends a first data to storage device 80, where the first data is initially stored in a buffer. The data is then returned from storage device 80 to MCH 20 as a second data. Processor 30 then checks this second data against the first data that was sent to storage device 80. If the second data is considered “good”, then the data is considered to “pass” and the passing value assigned to the second data is “1”. Data, or the clock bias value at which it was transferred, is considered “good” when the first data matches the second data. (A clock bias generally exists when an internal clock is shifted with respect to the timing of the external clock.) Data that is received from storage device 80, checked by processor 30, and determined not to be “good” is assigned a value of zero, and this value is subsequently used to help determine the RCLK and the TCLK for MCH 20. This operation is described in greater detail with reference to FIGS. 3-4.
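The pass/fail check can be sketched as a small Python model. This is an illustration under stated assumptions, not the MCH firmware: `loopback_check`, `good_channel`, and `skewed_channel` are hypothetical names, and each channel function stands in for writing the first data into the buffer of storage device 80 and reading the second data back.

```python
from typing import Callable


def loopback_check(first_data: bytes, transfer: Callable[[bytes], bytes]) -> int:
    """Send first_data, receive second_data, and return 1 ("pass") or 0 ("fail")."""
    second_data = transfer(first_data)
    return 1 if second_data == first_data else 0


def good_channel(data: bytes) -> bytes:
    # Hypothetical channel with adequate timing margin: the buffer returns the data intact.
    buffer = bytearray(data)
    return bytes(buffer)


def skewed_channel(data: bytes) -> bytes:
    # Hypothetical channel with excess clock-data skew: the most significant
    # bit of every byte is sampled at the wrong time and read back as 0.
    return bytes(b & 0x7F for b in data)


pattern = bytes([0b10101010]) * 16
print(loopback_check(pattern, good_channel))    # 1
print(loopback_check(pattern, skewed_channel))  # 0
```

The returned 1 or 0 is the value that would be recorded for the clock bias setting under test.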

It will be appreciated that storage device 80 may include all types of memory, such as read only memory (ROM), synchronous dynamic random access memory (SDRAM), double data rate random access memory (DDRAM), magnetic disk storage mediums, optical storage mediums, flash memory devices, and/or other machine-readable mediums. Storage device 80 has stored therein data 82 and computer program 84. Data 82 represents data stored, for example, in one or more of the formats described herein. Computer program 84 represents the necessary code for performing any and/or all of the techniques described herein. It will be recognized by one of ordinary skill in the art that storage device 80 preferably contains additional instruction logic (e.g., computer programs), which is not necessary to understanding the invention. Storage device 80 is coupled to MCH 20 through bus (also referred to herein as a channel) 22. Preferably, storage device 80 is a Rambus dynamic random access memory (RDRAM) manufactured by Rambus, Inc. of Mountain View, Calif., since RDRAM offers transfer rates of around 1000 megabits per second (Mbps).

The descriptions of the remaining devices on system board 10 are provided below. Processor 30 represents a central processing unit of any type of architecture, such as complex instruction set computer (CISC), reduced instruction set computer (RISC), very long instruction word (VLIW), or hybrid architecture. In addition, processor 30 could be implemented on one or more chips.

Chip 40 includes circuits that receive input from mouse 42 and control monitor 44.

Read-only memory (ROM) 50 is a type of data storage device that stores computer program(s), and the contents of ROM 50 generally cannot be altered. Hard disk 60 may include one or more rigid magnetic disks divided into a number of evenly spaced concentric circular tracks that may be used to store information. Master clock 70 controls processor 30 and MCH 20. Additionally, master clock 70 is generally used to coordinate, through clock cycles, each communication transported within system board 10. A clock cycle is used herein to describe one period of a computer clock.

DRCG clock 90 generally serves the purpose of controlling the timing between devices such as MCH 20 and storage device 80. DRCG clock 90 accomplishes this task by sending out clock pulses that oscillate back and forth. The clock pulses indicate that data will be transmitted. The clock pulses also trigger the time at which the data is sent between MCH 20 and storage device 80.

In FIG. 2, MCH 20 is coupled to DRCG clock 90 and to storage device 80. It will be appreciated that each chip has at least two clocks: a clock 1 and at least a clock 2. DRCG clock 90 sends a clock signal to MCH 20 indicating that data is to be sent from MCH 20; the clock signal then travels to storage device 80 to indicate that it will be receiving data. The clock signal then terminates at termination point 100. The clock signal is passively received by all chips and is used to determine RCLK and TCLK times. DRCG clock 90 then sends a second clock signal to MCH 20 indicating that MCH 20 must send the data to storage device 80.

In order to implement the techniques described herein to achieve synchronization of the clock signal and data transfer, the phase relationship between the clock and the data must be adjusted. There are at least two adjustments that occur in one embodiment of the invention. The first adjustment, the TCLK adjustment, concerns when data is sent, for example from the memory to MCH 20. The second adjustment, the RCLK adjustment, concerns when the receiver of the data, such as storage device 80, expects to receive the data.

In one embodiment of the invention, the clock bias for clock 1 and clock 2 of storage device 80 may be set to zero. Therefore, only the TCLK and RCLK of clock 1 and clock 2 on MCH 20 need to be adjusted. However, it will be appreciated that RCLK and TCLK may be adjusted on a variety of devices, such as RDRAM. In this case, clock 1 and clock 2 on MCH 20 are set to zero and the RCLK and TCLK for each RDRAM are adjusted.

TCLK Adjustment

FIG. 3 illustrates one embodiment of the invention wherein, at operation 200, the TCLK bias is set to its lowest value and the RCLK is set to zero bias. At operation 210, a 10-kilobyte memory test is executed using the values that have been established by the program logic implementing techniques of one embodiment of the invention. At operation 220, the program logic determines whether the “memory passes” (e.g., whether the second data sent from storage device 80 to MCH 20 matches the first data sent from MCH 20 to storage device 80, so that the second data is considered “good”). If so, then at operation 230 the “memory passing” value is saved into storage device 80. “Memory passing” is the value assigned to data according to whether it is “good”: data that is considered “good” is assigned the value “1” and data that is not “good” is assigned the value “0”. In this manner, Matrix 1 and Matrix 2 described below are filled with “1”s or “0”s, which in turn determine the optimum RCLK and TCLK times. The lowest passing value is referred to as TCLK_pass_low and the highest passing value is referred to as TCLK_pass_high. At operation 240, this process is repeated for all (or substantially all) TCLK bias values, stepping upward from the lowest value; TCLK_pass_low and TCLK_pass_high change as the TCLK bias is changed. This allows a matrix such as that shown in Matrix 1 to be completely filled in, as in Matrix 2. The TCLK bias is then set at the TCLK value that is closest to (TCLK_pass_high + TCLK_pass_low)/2. At operation 250, the process is terminated when a termination criterion or criteria are met. The termination criterion or criteria are established by either a user or a system designer. It will be appreciated that the TCLK and RCLK adjustments are dynamic processes. Accordingly, the TCLK adjustment and the RCLK adjustment described below may be restarted every millisecond, or at another time interval, while a system is operating. Additionally, it will be appreciated that the RCLK adjustment and the TCLK adjustment may occur at about the same time.
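The TCLK sweep and centering step can be sketched in Python as follows. This is a minimal model, not the patented program logic: `center_tclk` and `board_test` are hypothetical names, the 25 ps bias step is an assumption, and the code assumes the passing TCLK values form one contiguous window, so that averaging its two edges lands inside it.

```python
from typing import Callable, Sequence

# memory_test(tclk_ps, rclk_ps) -> True if the second data matched the first data.
MemoryTest = Callable[[int, int], bool]


def center_tclk(tclk_values: Sequence[int], memory_test: MemoryTest, rclk_ps: int = 0) -> int:
    """Sweep every TCLK bias with RCLK held fixed (zero bias by default) and return
    the discrete TCLK value closest to (TCLK_pass_high + TCLK_pass_low) / 2."""
    passing = [t for t in sorted(tclk_values) if memory_test(t, rclk_ps)]
    if not passing:
        raise RuntimeError("no TCLK bias value passed the memory test")
    tclk_pass_low, tclk_pass_high = passing[0], passing[-1]
    midpoint = (tclk_pass_high + tclk_pass_low) / 2
    # Snap the midpoint to the nearest bias setting the hardware actually supports.
    return min(tclk_values, key=lambda t: abs(t - midpoint))


# Hypothetical board whose transmit window passes from -50 ps to +100 ps.
def board_test(tclk_ps: int, rclk_ps: int) -> bool:
    return -50 <= tclk_ps <= 100


bias_steps = list(range(-100, 101, 25))     # -100, -75, ..., 100 ps (assumed step size)
print(center_tclk(bias_steps, board_test))  # 25
```

Snapping to the nearest discrete setting reflects the text's wording that TCLK is set to the value "closest to" the computed midpoint.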

RCLK Adjustment

FIG. 4 shows one embodiment of the invention for performing an RCLK adjustment. The adjustment of the value for RCLK is similar to the process used to adjust the TCLK. It will be appreciated that the RCLK bias may be set at any value, but for purposes of illustration, the RCLK bias is set to its lowest value at operation 300. Likewise, TCLK may be set at any value indicated in Matrix 1. At operation 310, a 10-kilobyte memory test is then executed using the values established for RCLK and TCLK.

At operation 320, the program logic determines whether the “memory passes” (e.g., whether the first data sent from MCH 20 to storage device 80 and the second data sent from storage device 80 to MCH 20 match); if so, the second data is considered good. Then, at operation 330, the “passing” value of “1” is saved into memory, provided that the RCLK value in the register is higher than the highest, or lower than the lowest, previous value at which memory passed.

The lowest passing value is referred to as RCLK_pass_low and the highest passing value is referred to as RCLK_pass_high. It will be appreciated that these values change as the RCLK bias is stepped to values different from previous RCLK bias values. At operation 340, the process is repeated until the termination criterion is met. At operation 350, the termination criterion is met and the process is terminated.
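The RCLK procedure of FIG. 4 can be sketched as an upward sweep that records only passes extending the known window. The text leaves the termination criterion to the user or system designer; this Python sketch assumes, purely for illustration, that the sweep stops at the first failure after a passing value. `sweep_rclk`, `board_test`, and the receive window are hypothetical.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def sweep_rclk(rclk_values: Sequence[int], memory_test: Callable[[int, int], bool],
               tclk_ps: int = 0) -> Optional[Tuple[int, int]]:
    """Step RCLK upward from its lowest value with TCLK held fixed, and return
    (RCLK_pass_low, RCLK_pass_high), or None if nothing passed."""
    rclk_pass_low: Optional[int] = None
    rclk_pass_high: Optional[int] = None
    for rclk in sorted(rclk_values):
        if memory_test(tclk_ps, rclk):
            if rclk_pass_low is None:
                rclk_pass_low = rclk        # first pass seen while stepping upward
            rclk_pass_high = rclk           # each later pass extends the window upward
        elif rclk_pass_high is not None:
            break                           # assumed termination criterion: first failure after a pass
    if rclk_pass_low is None or rclk_pass_high is None:
        return None
    return rclk_pass_low, rclk_pass_high


tried: List[int] = []


def board_test(tclk_ps: int, rclk_ps: int) -> bool:
    tried.append(rclk_ps)                   # record which settings were actually tested
    return -25 <= rclk_ps <= 50             # hypothetical receive window


print(sweep_rclk(range(-100, 101, 25), board_test))  # (-25, 50)
print(tried[-1])                                      # 75: the sweep stopped early
```

Stopping early saves memory-test runs, at the cost of missing a second, disjoint passing window if one existed.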

It will be appreciated by those skilled in the art that although Matrix 1 shown below is a 5×5 matrix, other sizes of matrices may be used depending upon the number of TCLK bias values or RCLK bias values that are used in the computer program. It will also be appreciated that memory tests other than the 10-kilobyte memory test may be used.

In order to better understand the features of the techniques described herein, provided below is an example of a 5×5 matrix that is used in determining the optimum RCLK and TCLK. Matrix 1 is empty to show that the process has not yet begun.

Matrix 1 Represents the RCLK of Clock 1 and the TCLK of Clock 2 in Picoseconds

Clock 2 \ Clock 1    −100    −50      0     50    100
−100
−50
0
50
100

For purposes of illustration, assume that five clock 1 values and five clock 2 values exist. For both clock 1 and clock 2, the five values are −100, −50, 0, 50, and 100 ps. Accordingly, 25 combinations exist for clock 1 and clock 2. For clock 1, 100 ps means that the data is transmitted 100 ps earlier than usual. For clock 2, −100 ps means that the data receive window is shifted 100 ps earlier. Clock 1 and clock 2 are first set to −100 ps. In this example, a first data is then sent from MCH 20 to storage device 80 and stored in a buffer. A second data is then sent back from the buffer to MCH 20. Processor 30 then compares the second data received from storage device 80 with the first data that MCH 20 sent to storage device 80. If the second data is “good”, such that it matches the first data, a “1” is stored in the cell of the 5×5 matrix for −100 ps on clock 1 and −100 ps on clock 2. If the data is not “good”, a “0” is stored. Clock 1 is then changed to −50 ps and the transmit, receive, and check cycle is repeated. Eventually, Matrix 1 is completely filled out, as shown in Matrix 2.

Matrix 2 for RCLK of Clock 1 and TCLK of Clock 2 in Picoseconds

Clock 2 \ Clock 1    −100    −50      0     50    100
−100                    0      0      0      0      0
−50                     0      0      0      0      0
0                       0      0      1      1      1
50                      0      0      1      1      1
100                     0      0      1      1      1

Since the range of “good” data is 0 to 100 ps for both clock 1 and clock 2, clock 1 and clock 2 may each be set to 50 ps, the midpoint (0 + 100)/2 of the passing range, and one cycle of one embodiment of the invention is complete. This process is repeated until the entire matrix is completely filled.
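The 25-combination procedure and the choice of 50 ps can be sketched in Python. In this sketch rows are clock 2 settings and columns are clock 1 settings (an assumed layout; Matrix 2 is symmetric, so the choice does not change the result), and `board_test` is a hypothetical stand-in chosen so that it reproduces Matrix 2. Averaging the lowest and highest passing setting on each axis, as here, is only a reasonable choice when the passing region is roughly rectangular, as it is in Matrix 2.

```python
from typing import Callable, List, Sequence, Tuple

BIAS_PS = [-100, -50, 0, 50, 100]   # the five settings used in Matrix 1 and Matrix 2


def fill_matrix(clock1: Sequence[int], clock2: Sequence[int],
                memory_test: Callable[[int, int], bool]) -> List[List[int]]:
    """Row i is clock 2 = clock2[i], column j is clock 1 = clock1[j]; 1 = pass, 0 = fail."""
    return [[1 if memory_test(c1, c2) else 0 for c1 in clock1] for c2 in clock2]


def center_of_passing(matrix: List[List[int]], clock1: Sequence[int],
                      clock2: Sequence[int]) -> Tuple[float, float]:
    """Average the lowest and highest passing setting on each axis."""
    pass_c1 = [c1 for j, c1 in enumerate(clock1) if any(row[j] for row in matrix)]
    pass_c2 = [c2 for i, c2 in enumerate(clock2) if any(matrix[i])]
    if not pass_c1:
        raise RuntimeError("no combination passed")
    return ((min(pass_c1) + max(pass_c1)) / 2, (min(pass_c2) + max(pass_c2)) / 2)


# Hypothetical test that reproduces Matrix 2: only non-negative settings pass.
def board_test(c1: int, c2: int) -> bool:
    return c1 >= 0 and c2 >= 0


matrix = fill_matrix(BIAS_PS, BIAS_PS, board_test)
for c2, row in zip(BIAS_PS, matrix):
    print(f"{c2:>5}", row)
print(center_of_passing(matrix, BIAS_PS, BIAS_PS))  # (50.0, 50.0)
```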

Worst Case Data Patterns Are Performed During A Read Or Write Cycle

In order to determine the boundaries of the data that “pass”, the data that fails should be determined. Worst case data patterns may be determined during a read cycle or a write cycle by trying various values for RCLK and TCLK as described above. Data that does not “pass” is assigned a zero, as mentioned above. Practical experience indicates that the pattern “101010” generally provides the worst case data pattern. It is to be appreciated that other data patterns may constitute the worst case data pattern in different system configurations, that is, the pattern that provides the least amount of timing margin.
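A test buffer built from the “101010” pattern can be generated as below. This Python sketch treats the pattern as a serial bit stream, where a plausible (assumed) reason it stresses timing is that every bit differs from its neighbor; on a parallel bus the pattern would have to be laid out per data line, which is not modeled. The function name and the 1 KB = 1024 bytes reading of “10-kilobyte” are assumptions.

```python
def worst_case_pattern(size_bytes: int, inverted: bool = False) -> bytes:
    """Return size_bytes of the alternating "1010..." bit pattern (0xAA per byte),
    or its complement "0101..." (0x55) when inverted is True."""
    fill = 0x55 if inverted else 0xAA
    return bytes([fill]) * size_bytes


TEST_SIZE = 10 * 1024   # "10-kilobyte" memory test, assuming 1 KB = 1024 bytes
pattern = worst_case_pattern(TEST_SIZE)
print(len(pattern), format(pattern[0], "08b"))   # 10240 10101010
```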

FIG. 5 illustrates another embodiment of the invention in the form of a flow chart and is similar to the embodiments shown in FIGS. 3 and 4. In this embodiment of the invention, two TCLK values and two RCLK values, one of each on the MCH and one of each on the memory repeater hub Rambus (“MRHR”), must be determined after data is sent between the MCH and the MRHR. As noted above, the data is checked in a similar fashion as described above, except that at least two Rambus channels are involved, such as those illustrated in FIG. 10B. At operation 400, TCLK_mrhr is set to the midpoint, such as zero ps. At operation 410, TCLK_mch is set to the lowest value that “passes.” The lowest value of TCLK_mch “passes” when the first data sent from MCH 20 to the MRHR and the second data sent from the MRHR to MCH 20 are such that the second data is considered “good” (e.g., the second data matches the first data). Starting with the RCLK_mrhr value high, RCLK_mrhr is decreased until there is a failure or the limit of that which is designated as “good” is met. At operation 420, the lowest passing TCLK_mch (in picoseconds) and the RCLK_mrhr value so reached (in picoseconds) are added together, and that sum is stored in storage device 80. At operation 430, TCLK_mch is set to the highest passing value.

Starting with the lowest RCLK_mrhr value, RCLK_mrhr is increased until there is a failure or the limit of that which is deemed “good” is reached. At operation 440, the highest passing TCLK_mch (in picoseconds) and the increased RCLK_mrhr (in picoseconds) are added together, and the sum is stored in storage device 80. At operation 450, the highest and lowest sums are added together and divided by two to get the midpoint. At operation 460, the TCLK values above and below the midpoint are determined. At operation 470, either the pair TCLK_midpoint_high and RCLK_mrhr or the pair TCLK_midpoint_low and RCLK_mrhr is selected, whichever gives the value closest to the midpoint. At operation 480, the procedure is repeated for TCLK_mrhr and RCLK_mch, using the values established as described above. At operation 490, the termination criterion is met and the process is ended until it is automatically restarted.
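One possible reading of operations 400 through 450 is sketched below in Python. The wording of FIG. 5 leaves room for interpretation, so treat this as an assumption-laden illustration rather than the described procedure: the additive timing model in `board_test`, the RCLK_mrhr value used while finding the passing TCLK_mch range, and all function names are hypothetical, and operations 460 through 490 are not modeled.

```python
from typing import Callable, Optional, Sequence

# memory_test(tclk_mch, rclk_mrhr) -> pass?  TCLK_mrhr is assumed fixed at 0 ps (operation 400).
MemoryTest = Callable[[int, int], bool]


def last_pass(tclk_mch: int, rclk_values: Sequence[int], descending: bool,
              memory_test: MemoryTest) -> Optional[int]:
    """Walk RCLK_mrhr from one end and return the last value before the first failure."""
    last: Optional[int] = None
    for rclk in sorted(rclk_values, reverse=descending):
        if not memory_test(tclk_mch, rclk):
            break
        last = rclk
    return last


def fig5_midpoint(tclk_values: Sequence[int], rclk_values: Sequence[int],
                  memory_test: MemoryTest, rclk_start: int = 0) -> float:
    # Operations 410 and 430: lowest and highest passing TCLK_mch (RCLK_mrhr at rclk_start, assumed).
    passing = [t for t in sorted(tclk_values) if memory_test(t, rclk_start)]
    if not passing:
        raise RuntimeError("no TCLK_mch value passed")
    tclk_low, tclk_high = passing[0], passing[-1]
    # Operations 410-420: at the lowest TCLK_mch, step RCLK_mrhr down from its high end.
    rclk_down = last_pass(tclk_low, rclk_values, True, memory_test)
    # Operations 430-440: at the highest TCLK_mch, step RCLK_mrhr up from its low end.
    rclk_up = last_pass(tclk_high, rclk_values, False, memory_test)
    if rclk_down is None or rclk_up is None:
        raise RuntimeError("RCLK_mrhr walk failed at its starting value")
    sum_low = tclk_low + rclk_down
    sum_high = tclk_high + rclk_up
    # Operation 450: add the two stored sums and divide by two.
    return (sum_low + sum_high) / 2


# Hypothetical link in which transmit and receive offsets add, passing while the
# combined offset stays between -75 ps and +125 ps.
def board_test(tclk_mch: int, rclk_mrhr: int) -> bool:
    return -75 <= tclk_mch + rclk_mrhr <= 125


steps = list(range(-100, 101, 25))
print(fig5_midpoint(steps, steps, board_test))  # 25.0
```

Under this additive model the two stored sums are simply the edges of the combined passing window (−75 ps and +125 ps), which is why their average is a sensible centering target.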

FIGS. 6-9 show in greater detail schematic illustrations of the clock pulses and signals emitted from DRCG clock 90. FIGS. 6 and 7 are block diagrams that show the clock pulses being emitted from DRCG clock 90. FIG. 6 is a block diagram illustrating storage device 80 in connection with MCH 20 in accordance with one embodiment of the invention. MCH 20 has data lines that enter storage device 80. The data lines also enter RDRAM 120. It will be appreciated that the RDRAM memory may comprise up to 32 RDRAM devices. It will also be appreciated that storage device 80 includes a clock whose signal is generally a sine wave formed as the differential of two clock signals, one of which is high while the other is low. The sine wave occurs at the backside of the RDRAM. FIG. 7 illustrates the same block diagram as FIG. 6, except that FIG. 7 further shows the alternating sine waves being emitted from DRCG clock 90.

FIG. 8 illustrates a clock pulse generated by the DRCG clock on system board 10 in accordance with one embodiment of the invention. At clock crossing 140, which is, for example, at 625 ps, data is launched. FIG. 9 illustrates a clock pulse and point A, wherein data is launched 625 ps after the clock crossing occurs.
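A launch delay such as 625 ps can be related to clock phase once a clock frequency is known. The text does not state that frequency, so this Python sketch assumes, purely for illustration, a 400 MHz clock (a 2500 ps period), under which 625 ps is a quarter period, or 90 degrees. The function name is hypothetical.

```python
def delay_to_phase_deg(delay_ps: float, clock_mhz: float) -> float:
    """Convert a delay after the clock crossing into degrees of clock phase."""
    period_ps = 1_000_000 / clock_mhz        # 1 / f, with f in MHz, gives the period in ps
    return (delay_ps / period_ps * 360.0) % 360.0


# Assumed 400 MHz clock (2500 ps period); the text does not state the frequency.
print(delay_to_phase_deg(625, 400))   # 90.0
print(delay_to_phase_deg(100, 400))   # the +100 ps bias extreme of Matrix 1, in degrees
```

The same conversion shows how coarse or fine the ±100 ps bias range of Matrix 1 is relative to one clock period under whatever frequency a real board uses.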

FIGS. 10A-10C illustrate various embodiments of the invention wherein one or more channels are used in connection with MCH 20. FIG. 10A illustrates MCH 20 coupled to RDRAM by direct channel 52. In this embodiment of the invention, only one TCLK and one RCLK must be adjusted. Adjusting the RCLK and the TCLK on MCH 20 may be problematic when MCH 20 is coupled to RDRAM, because a single setting on MCH 20 may not work properly with each RDRAM. In this embodiment, the RCLK and TCLK for MCH 20 are set to zero, and the TCLK and the RCLK may be adjusted for each individual RDRAM.

FIG. 10B illustrates MCH 20 coupled to MRHR 42 by two direct channels (56, 58) that exit MRHR 42 and enter MCH 20, and by channel 54, which exits MCH 20 and enters MRHR 42. In this embodiment of the invention, each of the three channels has a TCLK and an RCLK that may be adjusted. Having a plurality of channels generally allows the computer system to operate faster and more efficiently, because more data can be processed on more channels.

FIG. 10C illustrates MCH 20 coupled to memory repeater hub SDRAM (MRHS) 44 by direct channel 62. Direct channel 62 has one TCLK and one RCLK that may be adjusted.

FIG. 11 illustrates another embodiment of the invention in which a voltage reference (V_(reference)) is adjusted. In this embodiment, the TCLK and the RCLK have been adjusted on MCH 20, the TCLK and the RCLK have been adjusted on storage device 80 (or other suitable device), and the voltage reference (V_(reference)) for DRCG clock 90 is automatically adjusted to its optimum value. The clock crossing is where data is sampled. It will be appreciated that the high voltage (V_(high)) is at 1.8 volts, the low voltage (V_(low)) is at 1.0 volts, and V_(reference) is set, for example, at 1.4 volts, midway between them. V_(reference) is used to distinguish the high voltage from the low voltage, and may be adjusted up or down. In this embodiment, a first current from MCH 20 is determined when a first data is sent to storage device 80. A second current is determined when a second data is sent from storage device 80 to MCH 20. If the second current matches the first current, the current “passes”, and a “1” is assigned to the “pass” and stored in storage device 80. If the second current does not match the first current, the current “fails”, and a “0” is assigned to the “fail” and stored in storage device 80. A matrix similar to Matrix 1 is filled in to produce a matrix similar to Matrix 2. The techniques described herein are thereby used to determine the range of values that yield “good” data, and therefore the optimum V_(reference).
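A one-dimensional simplification of the V_(reference) search can be sketched in Python. The text fills a matrix and compares currents; this sketch instead sweeps V_(reference) alone between V_(low) = 1.0 V and V_(high) = 1.8 V and centers it in the passing range. The 50 mV step, the passing window, and the function names are hypothetical, and integer millivolts are used to avoid floating-point rounding.

```python
from typing import Callable, List


def vref_candidates(v_low_mv: int = 1000, v_high_mv: int = 1800, step_mv: int = 50) -> List[int]:
    """Candidate V_reference settings between V_low and V_high, in millivolts."""
    return list(range(v_low_mv, v_high_mv + 1, step_mv))


def center_vref(candidates: List[int], passes: Callable[[int], bool]) -> int:
    """Return the candidate closest to the middle of the passing V_reference range."""
    passing = [v for v in candidates if passes(v)]
    if not passing:
        raise RuntimeError("no V_reference setting passed")
    midpoint = (min(passing) + max(passing)) / 2
    return min(candidates, key=lambda v: abs(v - midpoint))


# Hypothetical receiver that samples correctly only for 1200 mV <= V_reference <= 1600 mV.
def board_passes(vref_mv: int) -> bool:
    return 1200 <= vref_mv <= 1600


print(center_vref(vref_candidates(), board_passes))  # 1400
```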

The exemplary embodiments described herein are provided merely to illustrate the principles of the invention and should not be construed as limiting the scope of the subject matter of the terms of the claimed invention. The principles of the invention may be applied toward a wide range of systems to achieve the advantages described herein and to achieve other advantages or to satisfy other objectives, as well.

What is claimed is:
 1. A method comprising: allowing a processor to execute instructions for improving a timing margin in a first integrated circuit (IC) die under test, wherein the execution (a) instructs the first IC die to set a relative phase of receive and reference clock signals at the first IC die, from a plurality of discrete, receive phase values, (b) instructs the first IC die to set a relative phase of transmit and reference clock signals at the first IC die, from a plurality of discrete, transmit phase values, (c) instructs the first IC die to drive a sequence of outgoing data symbols according to the transmit clock and receive a sequence of incoming data symbols according to the receive clock, at the relative phase settings of (a) and (b), and compares the outgoing symbols to the incoming symbols, and then (d) repeats (a)-(c) for other combinations of said plurality of transmit and receive phase values, and then (e) sets the relative phases, to values which are closest to yielding a balanced timing margin as determined from the results of the comparisons.
 2. The method of claim 1 wherein the plurality of discrete, transmit phase values are predetermined positive and negative clock bias numbers with respect to a zero clock bias.
 3. The method of claim 2 further comprising: allowing the processor to execute further instructions to repeat (a)-(c) for all other combinations of said plurality of transmit and receive phase values, prior to (e).
 4. An article of manufacture comprising: a machine-readable medium having instructions stored therein which, when executed by a processor, control a timing margin in an electronic system having first and second integrated circuit (IC) dies coupled to each other, the first IC die to drive a transmission line signal with a sequence of outgoing data symbols according to a transmit clock signal which is synchronized to a distributed clock signal, the first IC die to sample a transmission line signal to obtain a sequence of incoming data symbols according to a receive clock signal which is synchronized to the distributed clock signal, wherein execution of the instructions (a) sets a relative phase of the receive and the distributed clock signals, from a plurality of discrete, receive phase values, (b) sets a relative phase of the transmit and the distributed clock signals, from a plurality of discrete, transmit phase values, (c) instructs the first IC die to drive the sequence of outgoing data symbols and receive the sequence of incoming data symbols, at the relative phase settings of (a) and (b), compares the outgoing symbols to the incoming data symbols and records a result of the comparison, and then (d) repeats (a)-(c) for other combinations of said plurality of discrete, transmit phase values and discrete, receive phase values, and then (e) sets the relative phases, as in (a) and (b), to a pair of values, from said plurality of discrete, transmit and receive phase values, which are closest to yielding a balanced timing margin as determined from the results of the comparisons.
 5. The article of manufacture of claim 4 wherein the machine-readable medium includes further instructions which, when executed by the processor, store the results of the comparisons in an array of variables each to be assigned one of a pass value and a fail value, wherein a pass means that the sequences of incoming and outgoing symbols substantially match and a fail means that they do not, and wherein each variable refers to the result of a comparison for a different pair of said plurality of discrete, receive and transmit phase values.
 6. The article of manufacture of claim 5 wherein the machine-readable medium includes further instructions which, when executed by the processor, determine the largest timing margin by computing an average of the highest and lowest passing, discrete, transmit phase values, and an average of the highest and lowest passing, discrete, receive phase values in the array.
 7. An electronic system comprising: first and second integrated circuit (IC) dies coupled to each other via one or more data transmission lines, and a processor coupled to access the first and second IC dies, the first IC die to drive a transmission line signal, in one of the transmission lines, with a sequence of outgoing data symbols according to a transmit clock signal, the first IC die to derive the transmit clock signal from a distributed clock signal, which is distributed to the first and second IC dies, the first IC die to repeatedly sample a transmission line signal, from one of the transmission lines, to obtain a sequence of incoming data symbols according to a receive clock signal, the first IC die to derive the receive clock signal from the distributed clock signal, wherein the first IC die is to adjust (1) a relative phase of the transmit and distributed clock signals and (2) a relative phase of the receive and distributed clock signals, according to values stored in the first IC die and determined by the processor executing a program that evaluates data transfers between the first and second IC dies to obtain the values as yielding largest average timing margin for the data transfers.
 8. The electronic system of claim 7 further comprising a clock signal generator coupled to provide the first and second IC dies with the distributed clock signal via a pair of parallel traces formed in a printed wiring board and to which an external clock input of the first die is connected.
 9. The electronic system of claim 8 further comprising a termination circuit, wherein the clock signal generator is coupled to one end of the pair of traces and the termination circuit is coupled to another end of the pair of traces, and wherein the pair of traces is looped through the second IC die.
 10. The electronic system of claim 9 wherein the distributed clock signal is provided to the driver timing circuitry and to the receiver timing circuitry from upstream and downstream locations, respectively, on the pair of traces.
 11. The electronic system of claim 7 wherein the first IC die includes a dynamic random access memory storage array and the second IC die includes a memory controller.
 12. The electronic system of claim 7 wherein the first IC die includes a memory repeater and the second IC die includes a memory controller.
 13. A method comprising: allowing a processor to execute instructions for improving a timing margin in first and second integrated circuit (IC) dies, wherein the execution (a) sets a relative phase of receive and distributed clock signals at the first IC die, from a plurality of discrete, first receive phase values, (b) sets a relative phase of receive and distributed clock signals at the second IC die, from a plurality of discrete, second receive phase values, (c) instructs the second IC die to receive a sequence of outgoing data symbols according to the receive clock and the first IC die to receive a sequence of incoming data symbols according to the receive clock, at the relative phase settings of (b) and (a), respectively, and compares the outgoing symbols to the incoming symbols, and then (d) repeats (a)-(c) for other combinations of said plurality of first and second receive phase values, and then (e) sets the relative phases, as in (a) and (b), to values which are closest to yielding a balanced timing margin as determined from the results of the comparisons.
 14. The method of claim 13 wherein the plurality of discrete, first and second receive phase values are predetermined positive and negative clock bias numbers with respect to a zero clock bias.
 15. A method comprising: allowing a processor to execute instructions for improving a timing margin in first and second integrated circuit (IC) dies, wherein the execution (a) sets a relative phase of transmit and distributed clock signals at the first IC die, from a plurality of discrete, first transmit phase values, (b) sets a relative phase of transmit and distributed clock signals at the second IC die, from a plurality of discrete, second transmit phase values, (c) instructs the first IC die to drive a sequence of outgoing data symbols according to the transmit clock and the second IC die to drive a sequence of incoming data symbols according to the transmit clock, at the relative phase settings of (a) and (b), respectively, and compares the outgoing symbols to the incoming symbols, and then (d) repeats (a)-(c) for other combinations of said plurality of first and second transmit phase values, and then (e) sets the relative phases, as in (a) and (b), to values which are closest to yielding a balanced timing margin as determined from the results of the comparisons.
 16. The method of claim 15 wherein the plurality of discrete, first and second transmit phase values are predetermined positive and negative clock bias numbers with respect to a zero clock bias. 